Structured Orthogonal Inversion of Block p-Cyclic Matrices on Multicores with GPU Accelerators

Authors

  • Sergiy Gogolenko
  • Zhaojun Bai
  • Richard Scalettar
Abstract

We present a block structured orthogonal factorization (BSOF) algorithm and its parallelization for computing the inverse of block p-cyclic matrices. We target high performance on multicore systems with GPU accelerators. We provide a quantitative performance model for optimal host-device load balance and validate the model through numerical tests. Benchmarking results show that the parallel BSOF-based inversion algorithm attains up to 90% of DGEMM performance on hybrid CPU+GPU systems.
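As a rough illustration of the problem the abstract describes, the sketch below assembles a small dense block p-cyclic matrix and inverts it through a generic QR (orthogonal) factorization. It is not the authors' BSOF algorithm and does not exploit the block structure or any GPU. The normalized layout (identity diagonal blocks, negated subdiagonal blocks, one corner block) and the helper names `block_p_cyclic` and `invert_via_qr` are assumptions made for this example only.

```python
import numpy as np

def block_p_cyclic(blocks):
    """Assemble a dense block p-cyclic matrix in an assumed normalized form:
    identity blocks on the diagonal, blocks[0] in the top-right corner,
    and -blocks[i] on the block subdiagonal for i = 1, ..., p-1."""
    p = len(blocks)
    n = blocks[0].shape[0]
    M = np.eye(p * n)
    # Corner block in the first block row, last block column.
    M[:n, (p - 1) * n:] = blocks[0]
    # Negated blocks on the block subdiagonal.
    for i in range(1, p):
        M[i * n:(i + 1) * n, (i - 1) * n:i * n] = -blocks[i]
    return M

def invert_via_qr(M):
    """Invert M through a dense QR factorization: M = Q R, so M^{-1} = R^{-1} Q^T."""
    Q, R = np.linalg.qr(M)
    return np.linalg.solve(R, Q.T)

rng = np.random.default_rng(0)
p, n = 4, 3  # p block rows/columns, each block n-by-n (illustrative sizes)
blocks = [0.3 * rng.standard_normal((n, n)) for _ in range(p)]

M = block_p_cyclic(blocks)
M_inv = invert_via_qr(M)

# The residual should be close to machine precision for this small example.
residual = np.linalg.norm(M @ M_inv - np.eye(p * n))
print(f"||M @ M_inv - I|| = {residual:.2e}")
```

A dense QR of the full matrix costs O((pn)^3) operations; a structured factorization such as the one the abstract describes would instead work block by block, which this sketch does not attempt.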

Similar resources

Block-Relaxation Methods for 3D Constant-Coefficient Stencils on GPUs and Multicore CPUs

Block iterative methods are extremely important as smoothers for multigrid methods, as preconditioners for Krylov methods, and as solvers for diagonally dominant linear systems. Developing robust and efficient smoother algorithms suitable for current and evolving GPU and multicore CPU systems is a significant challenge. We address this issue in the case of constant-coefficient stencils arising ...

Structured Condition Numbers for Invariant Subspaces

Invariant subspaces of structured matrices are sometimes better conditioned with respect to structured perturbations than with respect to general perturbations. Sometimes they are not. This paper proposes an appropriate condition number cS for invariant subspaces subject to structured perturbations. Several examples compare cS with the unstructured condition number. The examples include block ...

GPGPU parallel algorithms for structured-grid CFD codes

A new high-performance general-purpose graphics processing unit (GPGPU) computational fluid dynamics (CFD) library is introduced for use with structured-grid CFD algorithms. A novel set of parallel tridiagonal matrix solvers, implemented in CUDA, is included for use with structured-grid CFD algorithms. The solver library supports both scalar and block-tridiagonal matrices suitable for approxima...

Solving a large scale radiosity problem on GPU-based parallel computers

The radiosity equation has been used widely in computer graphics and thermal engineering applications. The equation is simple to formulate but is challenging to solve when the number of Lambertian surfaces associated with an application becomes large. In this paper, we present the algorithms to compute the view factors and solve the set of radiosity equations using an out-of-core Cholesky decompo...

Publication date: 2014